The loss of the tail is among the most notable anatomical changes to have occurred
along the evolutionary lineage leading to humans and to the ‘anthropomorphous
apes’, with a proposed role in contributing to human bipedalism. Yet, the genetic
mechanism that facilitated tail-loss evolution in hominoids remains unknown. Here
we present evidence that an individual insertion of an Alu element in the genome of 
the hominoid ancestor may have contributed to tail-loss evolution. We demonstrate 
that this Alu element, inserted into an intron of the TBXT gene, pairs with a
neighbouring ancestral Alu element encoded in the reverse genomic orientation and
leads to a hominoid-specific alternative splicing event. To study the effect of this
splicing event, we generated multiple mouse models that express both full-length and
exon-skipped isoforms of Tbxt, mimicking the expression pattern of its hominoid
orthologue TBXT. Mice expressing both Tbxt isoforms exhibit a complete absence of
the tail or a shortened tail depending on the relative abundance of Tbxt isoforms
expressed at the embryonic tail bud. These results support the notion that the
exon-skipped transcript is sufficient to induce a tail-loss phenotype. Moreover, mice
expressing the exon-skipped Tbxt isoform develop neural tube defects, a condition
that affects approximately 1 in 1,000 neonates in humans. Thus, tail-loss evolution
may have been associated with an adaptive cost of the potential for neural tube 
defects, which continue to affect human health today. 


The tail appendage varies widely in its morphology and function across 
vertebrate species. For primates in particular, the tail is adapted to a
range of environments, with implications for the style of locomotion of
the animal. The New World howler monkeys, for example, evolved a
prehensile tail that helps with the grasping or holding of objects while
occupying arboreal habitats. Hominoids (which include humans and
the apes), however, lost their external tail during evolution. The loss of
the tail is inferred to have occurred around 25 million years ago when
the hominoid lineage diverged from the ancient Old World monkeys
(Fig. 1a), leaving only 3–5 caudal vertebrae to form the coccyx, or
tailbone, in modern humans.

It has long been speculated that tail loss in hominoids contributed 
to orthograde and bipedal locomotion, the evolutionary occurrence of 
which coincided with the loss of the tail. Yet, the genetic mechanism
that facilitated either tail-loss evolution or orthograde and bipedal
locomotion in hominoids remains unknown. Recent progress in primate
genome sequencing projects has made it possible to infer causal links
between genotypic and phenotypic changes, and has enabled the
search for hominoid-specific genetic elements that control tail
development. Moreover, developmental genetics studies of vertebrates have
led to the elucidation of the gene regulatory networks that underlie
tail development. For example, the Mouse Genome Informatics
(MGI) database includes more than 100 genes identified from natural
mutants and induced mutagenesis studies relating to the absence
or shortening of the tail phenotype (Supplementary Data 1 and
Methods). Expression of these genes, including the core factors for
inducing mesoderm and definitive endoderm such as Tbxt (also called T
or Brachyury), Wnt3a and Msgn1, is enriched in the development of the
primitive streak and posterior body formation. Although perturbations
of these genes may lead to the shortening or complete absence of the
tail, the causal genetic changes that drove the evolution of tail loss in
hominoids remain unknown. Understanding the genetics of tail loss
in hominoids may provide insight into the evolutionary pressure that 
led to human traits such as bipedalism. 


A hominoid-specific intronic AluY in TBXT

With the goal of identifying genetic variants associated with the loss 
of the tail in hominoids, we initially screened 31 human genes—and 
their primate orthologues—for which mutations are associated with 

Fig. 1 | Evolution of tail loss in hominoids. a, Tail phenotypes across the primate
phylogenetic tree. Ma, millions of years ago. b, UCSC Genome browser view
of the conservation score through multi-species alignment at the TBXT locus
across primate genomes. Exon numbering of human TBXT follows a conventional
order across species without including the 5′ untranslated region exon. The


the absence of an external tail (MGI database annotation ‘absent tail’; 
Supplementary Data 1 and Methods). We first examined protein 
sequence conservation between the hominoid genomes and their 
closest sister lineage, the Old World monkeys (Cercopithecidae). 
Failing to detect candidate variants in the coding regions of this gene 
set, we expanded the search in two ways: (1) adding 109 genes for 
which mutation in their mouse orthologues includes tail-reduction 
phenotypes annotated in the MGI as ‘vestigial tail’ or ‘short tail’; 
and (2) systematically screening for hominoid-specific variants in 
the entire gene region and their 10 kb upstream and downstream 
sequences (Supplementary Data 1 and Methods). Together, we 
detected 85,064 single nucleotide variants (SNVs), 5,533 deletions 
and 13,820 insertions that are hominoid-specific (Extended Data Fig. 1 
and Supplementary Data 1-4). Among these changes, we identified 
nine protein-sequence-altering variants (seven missense variants and
two in-frame deletions) with predicted impacts on function
(Supplementary Data 1 and Methods). However, these variants originated
from genes whose perturbation causes more general growth
and developmental defects as opposed to specifically tail-reduction
phenotypes (Supplementary Data 1). Although we were not able to
exclude the possibility that these variants might have contributed to 
tail-loss evolution in hominoids, we did not find additional supporting 

hominoid-specific AluY element is highlighted in red. LINE, long interspersed
nuclear element; LTR, long terminal repeat; SINE, short interspersed nuclear
element. c, Schematic of the proposed mechanism of tail-loss evolution in
hominoids. Primate images in a and c were created using BioRender
(https://biorender.com).


evidence to prioritize their experimental validation as a plausible 
genetic mechanism. 

Examining non-coding hominoid-specific variants among the genes 
related to tail development (Methods), we recognized an Alu element in
the sixth intron of the hominoid TBXT gene (Fig. 1b). This element had
the following notable combination of features: (1) a hominoid-specific
phylogenetic distribution; (2) presence in a gene known for its
involvement in tail formation; and (3) proximity and orientation relative to
a neighbouring Alu element. First, this particular hominoid-specific
Alu element is from the AluY subfamily, a relatively ‘young’ but not
human-specific subfamily shared among the genomes of hominoids
and Old World monkeys. Moreover, the inferred insertion time, given
the phylogenetic distribution (Fig. 1a), coincides with the evolutionary
period when early hominoids lost their tails. Second, TBXT encodes
a highly conserved transcription factor crucial for mesoderm and
definitive endoderm formation during embryonic development.
Heterozygous mutations in the coding regions of TBXT orthologues in
tailed animals such as mouse, Manx cat, dog and zebrafish lead
to the absence or reduced forms of the tail, and homozygous mutants
are typically non-viable.

Third, we inferred that the AluY insertion may mediate an alternative
splicing (AS) event of the hominoid TBXT in an unusual way. This
AluY element is not inserted in the vicinity of a splice site; instead,
it is >500 bp from exon 6 of TBXT, the nearest coding exon (Fig. 1b).
As such, it would not be expected, by itself, to lead to an AS event, as
found for other individual intronic Alu elements near exon boundaries
that directly affect splicing. However, we noted the presence of
another Alu element (AluSx1) in the reverse orientation in intron 5 of
TBXT that is shared among all monkeys and apes (simians). Together,
the AluY and AluSx1 elements form an exon-flanking inverted repeat
pair (Fig. 1b). We therefore posited that during transcription,
the hominoid-specific AluY element pairs with the simian-shared
AluSx1 element to form a stem-loop structure in TBXT pre-mRNA
and traps exon 6 in the loop (Fig. 1c). An inferred model of the RNA
secondary structure supported the interaction between these two
Alu elements (Extended Data Fig. 2). The secondary structure of the
transcript may conjoin the splice donor and acceptor sites of exons 5
and 7, respectively, and promote the skipping of exon 6, thereby
leading to a hominoid-specific and in-frame AS isoform: TBXTΔexon6
(Fig. 1c). We validated the existence of the TBXTΔexon6 transcript in
humans and its corresponding absence in mice, which lack both
Alu elements, using a system for embryonic stem (ES) cell in vitro
differentiation that induces TBXT expression similar to that present
in the primitive streak of the embryo (Extended Data Fig. 3a–d and
Supplementary Table 1). Considering the high conservation of TBXT
exon 6 and its potential role in transcriptional regulation rather than
DNA binding (Extended Data Fig. 3e,f), we proposed that
the AluY-insertion-induced TBXT(Δexon6) isoform protein disrupts
tail elongation during embryonic development, which then
contributes to the reduction or loss of an external tail (Fig. 1c).


AluY insertion in TBXT induces AS 


To test whether AluY and its interacting counterpart AluSx1 are
both required to induce the hominoid-specific AS of TBXT, we used
CRISPR–Cas9 to generate human ES cell lines in which either the
hominoid-specific AluY element or the AluSx1 element was deleted
(Fig. 2a, Extended Data Fig. 4a and Supplementary Tables 2–4). We
adapted the human ES cell in vitro differentiation system to mimic the
expression of TBXT in the embryo (Extended Data Fig. 3a). Deleting
AluY almost completely eliminated the generation of the TBXTΔexon6
isoform transcript (Fig. 2b, middle). Similarly, deleting the interacting
partner AluSx1 was sufficient to repress this alternatively spliced
isoform (Fig. 2b, right). These results support the notion that the
hominoid-specific AluY insertion induces a new TBXTΔexon6 AS isoform
through an interaction with the neighbouring simian-shared AluSx1
element (Fig. 2c, top).

Notably, wild-type differentiated human ES cells also expressed
a minor, previously un-annotated transcript that excludes both
exons 6 and 7, which led to a frameshift and early truncation at the
protein level (Fig. 2b, left, and Extended Data Fig. 4b). Whereas deleting
AluY slightly enhanced the abundance of this TBXTΔexon6–7 transcript,
deleting AluSx1 in intron 5 eliminated this transcript (Fig. 2b).
This result may be best explained by a secondary interaction of the
AluSx1 element with a distal and inverted AluSq2 element in intron 7.
In this scenario, the secondary interaction would occur at a lower
probability than the AluY–AluSx1 interaction pair (Fig. 2c, bottom).
It is noteworthy that the distance between the AluSx1–AluY pair is
substantially shorter (1,448 bp) than the AluSx1–AluSq2 distance
(4,188 bp). Furthermore, the nascent transcript would favour formation
of the former structure as there is a time period during which
the AluSx1–AluY structure can form and the distal structure cannot;
these factors could potentially explain the preferred formation of the
Δexon6 mRNA over Δexon6–7 mRNA. These results provide further
support to indicate that the interaction between intronic transposable
elements induces AS of a key developmental transcription
factor gene: TBXT (Fig. 2c).
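
The kinetic argument above can be illustrated with a rough back-of-the-envelope sketch. The two distances are taken from the text; the elongation rates are illustrative assumptions rather than values from this study, and the function name is hypothetical. The sketch treats the difference between the two distances as the stretch of pre-mRNA that still has to be transcribed before the distal AluSq2 partner exists.

```python
# Distances between the paired elements, as reported in the text (bp).
DIST_ALUSX1_ALUY = 1_448
DIST_ALUSX1_ALUSQ2 = 4_188


def head_start_seconds(near_bp, far_bp, elongation_bp_per_min):
    """Approximate time during which only the nearer partner has been transcribed.

    Simplification: the extra distance to the distal partner is divided by a
    constant elongation rate; pausing and co-transcriptional splicing are ignored.
    """
    extra_bp = far_bp - near_bp
    return extra_bp / elongation_bp_per_min * 60.0


extra = DIST_ALUSX1_ALUSQ2 - DIST_ALUSX1_ALUY  # additional sequence to transcribe

# Assumed elongation rates in bp per minute (hypothetical, for illustration only).
for rate in (1_000, 2_000, 4_000):
    window = head_start_seconds(DIST_ALUSX1_ALUY, DIST_ALUSX1_ALUSQ2, rate)
    print(f"{rate} bp/min: {extra} bp extra, about {window:.0f} s head start")
```

Under any of the assumed rates, the AluSx1–AluY pair is available for pairing before the AluSx1–AluSq2 pair can form, which is the qualitative point made in the text.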

Fig. 2 | Both AluY and AluSx1 are required for inducing alternative splicing
The TBXTΔexon6–7 transcript may stem from an AluSx1–AluSq2 interaction.


TbxtΔexon6 expression induces tail loss


To test whether the TBXTΔexon6 isoform is sufficient to induce tail loss,
we first used zygotic CRISPR targeting to generate a heterozygous
mouse model (TbxtΔexon6/+) that simultaneously expresses the TbxtΔexon6
transcript and its full-length transcript (Fig. 3a,b, Extended Data Fig. 5a
and Methods). TBXT is highly conserved in vertebrates, and human
and mouse protein sequences share 91% identity with a similar exon
and intron architecture. We reasoned that we could simulate a Δexon6
isoform by deleting exon 6 in mouse Tbxt and forcing the splicing of
exon 5 to exon 7 (Fig. 3b,c). The TbxtΔexon6/+ mouse therefore provides
a model of the expression of TBXT in humans, which express both
full-length and Δexon6 isoforms (Figs. 2b and 3b,c).

TbxtΔexon6/+ mice exhibited strong but heterogeneous
tail morphologies, including no-tail and short-tail phenotypes
(Fig. 3d,e and Extended Data Fig. 5b,c). Specifically, 21 out of the 63
heterozygous mice showed tail phenotypes, whereas none of their
35 wild-type littermates showed phenotypes (Table 1). The incomplete
penetrance of phenotypes among the heterozygotes was stable
across generations and founder lines: no-tail or short-tailed (TbxtΔexon6/+)
parents gave birth to long-tailed TbxtΔexon6/+ mice, whereas long-tailed
(TbxtΔexon6/+) parents gave birth to mice with varied tail phenotypes
(Table 1 and Extended Data Fig. 5b,c). These results provide further
evidence that the presence of TBXTΔexon6 is sufficient to induce tail loss.
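
The penetrance figures quoted above can be recomputed directly from the counts in Table 1. The following is a minimal sketch, not part of the original analysis: the variable and function names are hypothetical, and the one-sided Fisher exact test is added here purely as an illustration of how the heterozygous and wild-type groups could be compared.

```python
from math import comb

# Counts for heterozygous F1 mice as reported in Table 1.
HET_TOTAL = 63
HET_PHENOTYPES = {"no tail": 4, "short tail": 9, "kinked tail": 8}
WT_TOTAL, WT_AFFECTED = 35, 0
# (affected, total) heterozygotes per intercross type in Table 1.
HET_BY_INTERCROSS = {"type 1": (7, 17), "type 2": (14, 46)}


def penetrance(affected, total):
    """Fraction of mice of one genotype that show any tail phenotype."""
    return affected / total


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of seeing at least `a` in the
    top-left cell when all row and column totals are held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)


het_affected = sum(HET_PHENOTYPES.values())
print(f"overall penetrance: {penetrance(het_affected, HET_TOTAL):.1%}")
for label, (affected, total) in HET_BY_INTERCROSS.items():
    print(f"intercross {label}: {penetrance(affected, total):.1%}")

p_value = fisher_one_sided(het_affected, HET_TOTAL - het_affected,
                           WT_AFFECTED, WT_TOTAL - WT_AFFECTED)
print(f"heterozygous versus wild type, one-sided Fisher P = {p_value:.2e}")
```

Splitting the heterozygotes by intercross type shows that affected offspring arise from both affected and long-tailed parents, which is the pattern the text describes as incomplete penetrance.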

vertebrae; WT, wild type; arrowheads highlight differences in tail phenotypes.


To control for the possibility that zygotic CRISPR targeting
induced off-target DNA changes at the Tbxt locus, we performed
Capture-seq covering the Tbxt locus and about 200 kb of both
upstream-flanking and downstream-flanking regions (Extended Data
Fig. 5d,e). Capture-seq did not detect any off-target mutations at the
Tbxt locus across three independent founder mice, which supports
our conclusion that the observed tail phenotype in TbxtΔexon6/+ mice
was derived from the TbxtΔexon6 genotype.


Inserting intronic sequences in mouse Tbxt 


Although the heterozygous mouse model (TbxtΔexon6/+) showed that
expression of both full-length and Δexon6 splice isoforms can produce
a tail-loss phenotype, it does not assess whether AS is the mechanism for
its generation. We therefore sought to test whether AS in human TBXT
induced by the pairing of AluY and AluSx1 can be recapitulated in mouse
Tbxt, and whether such a genetic change induces tail phenotypes.

To that end, we first generated two mouse ES cell lines with Tbxt
modifications: one with the human AluY and AluSx1 elements inserted
simultaneously into introns 6 and 5 of Tbxt, respectively, and
one with a reverse complementary sequence (RCS) from Tbxt intron 5
inserted into intron 6 (Extended Data Fig. 6 and Supplementary Tables 2–4).
For the first model, we simultaneously inserted the AluY and AluSx1
elements into mouse Tbxt (henceforth referred to as TbxtinsASAY) in an
exon 6-flanking configuration that is analogous to the gene structure
in human TBXT (Extended Data Fig. 6a). We designed a two-step
strategy by first inserting two Alu elements together with a selection
gene cassette of a puromycin-resistance gene and a truncated thymidine
kinase gene (puro-ΔTK), flanked by loxP recombination motifs, for
positive selection and counter selection, respectively (Extended
Data Fig. 6a). Following the identification of mouse ES cell clones with
homozygous integration of the full construct, the selection gene cassette
was removed by transiently expressing Cre recombinase in the
selected clones through ΔTK-based counter selection (Extended Data
Fig. 6a and Methods).

For the second mouse ES cell line, we adopted the same strategy
but selected a 297 bp sequence endogenous to Tbxt intron 5 (the
same length as the human AluY) and then inserted its RCS into Tbxt
intron 6, thus forming an inverted sequence pair like the AluSx1–AluY
pair (referred to as TbxtinsRCS; Extended Data Fig. 6b). We confirmed that
both TbxtinsASAY/insASAY and TbxtinsRCS/insRCS ES cells expressed the TbxtΔexon6
splicing isoform after differentiation (Extended Data Fig. 6c). Notably,
the TbxtinsRCS/insRCS ES cells expressed a higher percentage of
TbxtΔexon6 transcripts relative to the full-length transcript than that of
TbxtinsASAY/insASAY ES cells (Extended Data Fig. 6c). This result could
be attributed to the sequence context difference and the higher
sequence identity in the TbxtinsRCS stem structure (297 out of 297
identical) than in the TbxtinsASAY stem structure (228 out of 297). Together,
these results demonstrate that the exon-skipping event caused by
inverted Alu pairs flanking an exon does not require any specific Alu
sequences, but can be caused by inverted sequence pairs of a
completely different sequence.
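
The notion of an inverted sequence pair and its stem identity can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function names are hypothetical, the toy sequence is invented, and the comparison is ungapped and position by position, whereas the method used to obtain the 228 out of 297 figure is not specified in this section.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complementary sequence (RCS) of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def stem_identity(upstream, downstream):
    """Fraction of positions at which `downstream` equals the RCS of `upstream`.

    Assumes both elements have the same length and compares them without gaps.
    """
    if len(upstream) != len(downstream):
        raise ValueError("ungapped comparison needs equal-length sequences")
    target = reverse_complement(upstream)
    matches = sum(x == y for x, y in zip(target, downstream))
    return matches / len(upstream)


# Hypothetical 10-nt stand-in for the 297 bp intron 5 sequence.
toy_intron5 = "ACGTTGCAAG"
toy_rcs = reverse_complement(toy_intron5)      # perfect partner, as in the RCS design
toy_diverged = "G" + toy_rcs[1:]               # one mismatch introduced by hand

print(stem_identity(toy_intron5, toy_rcs))       # perfect pairing
print(stem_identity(toy_intron5, toy_diverged))  # imperfect pairing

# Identities reported in the text for the two engineered stems.
print(f"insRCS stem:  {297 / 297:.1%}")
print(f"insASAY stem: {228 / 297:.1%}")
```

An engineered RCS is by construction a perfect partner, whereas two independently evolved Alu copies pair imperfectly, which is consistent with the difference in identity reported for the two designs.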


Abundance of Tbxt isoforms explains tail phenotypes 


Next, we aimed to generate mouse models that incorporate the engineered
TbxtinsASAY and TbxtinsRCS gene structures to study their tail
phenotypes (Extended Data Fig. 6d and Methods). Through multiple
experimental trials, we successfully generated one TbxtinsASAY mouse line
(Fig. 4a) but failed to derive any TbxtinsRCS mouse lines. Instead, we
serendipitously obtained another mouse line (henceforth called TbxtinsRCS2)
that had a 220 bp sequence from intron 6 inserted into intron 5 of
Tbxt, thereby resembling the TbxtinsRCS design through forming an RCS
pair flanking exon 6 (Fig. 4b, Extended Data Fig. 7a–c and Methods).
Neither heterozygous nor homozygous TbxtinsASAY mice showed obvious
tail phenotypes in adulthood (Fig. 4c). However, homozygous TbxtinsRCS2
mice (TbxtinsRCS2/insRCS2) consistently had around 10% shorter tails relative
to wild-type or heterozygous mice (Fig. 4d).

To gain insight into the distinct tail phenotypes in TbxtinsASAY and
TbxtinsRCS2 mice, we collected tailbud RNA samples from embryonic
stage 10.5 (E10.5) embryos, when Tbxt is anticipated to influence
tail development. Specifically, we processed RNA samples from
litter-controlled wild-type, heterozygous and homozygous mice
from intercrossed breeding pairs using heterozygous mice, followed
by PCR with reverse transcription (RT-PCR) analyses of the expression
patterns of Tbxt isoforms (Fig. 4e and Extended Data Fig. 7d).
TbxtinsASAY homozygous embryos expressed low levels of TbxtΔexon6 transcript
relative to the full-length transcript (Fig. 4e, left). By contrast,
TbxtinsRCS2/insRCS2 embryos expressed higher levels of the TbxtΔexon6 transcript
than the Tbxt full-length transcript (Fig. 4e, right). As expected,
in both lines, heterozygous embryos expressed lower levels of TbxtΔexon6
transcript than homozygous embryos of the same line (Fig. 4e).
These results suggest that the tail-length phenotype in TbxtinsASAY and
TbxtinsRCS2 mice can be explained by the relative abundance of TbxtΔexon6
and Tbxt full-length transcripts.

It is important to note that the TbxtinsASAY mice expressed a much
lower relative abundance of the TbxtΔexon6 isoform in the E10.5 embryonic
tailbud than that observed in the corresponding in vitro differentiated
mouse ES cells modelling primitive streak cells of E6.5 embryos
(Fig. 4e, left, and Extended Data Fig. 6c). Although it remains unclear
why this difference occurred, it may relate to differential splicing

Table 1 | Genotype and phenotype analyses of the F1 mice generated from intercrossing TbxtΔexon6/+ parents

Genotype            Total no. of   No. of mice with   No     Short   Kinked   Intercross   Intercross
                    F1 mice        tail phenotypes    tail   tail    tail     (type 1)*    (type 2)*
TbxtΔexon6/Δexon6   0              0                  0      0       0        0            0
TbxtΔexon6/+        63             21                 4      9       8        17 (7)†      46 (14)†
Tbxt+/+             35             0                  0      0       0        7 (0)†       28 (0)†

Note that tail phenotypes were categorized into no tail, short tail, kinked tail and long tail, as exemplified in Fig. 3e.
*For type 1 intercrossing, at least one of the parent mice has no tail or is short-tailed. For type 2 intercrossing, both parent mice are long-tailed.
†Numbers in parentheses indicate the total number of mice with tail phenotypes.



regulation in different cell types. Consequently, the fact that our
Alu-pair insertion model did not express high levels of the TbxtΔexon6
transcript in the embryonic tailbud renders this particular mouse model
inconclusive, beyond the insight that small amounts of this isoform
are insufficient to lead to tail loss.

Having noted that the relative abundance of the TbxtΔexon6 transcript
is important for regulating tail length, we next aimed to generate mice
with further increased relative abundance of the TbxtΔexon6 transcript.
To do so, we crossed TbxtΔexon6/+ heterozygous mice with the TbxtinsRCS2
mice. Notably, all 19 compound heterozygous mice (TbxtinsRCS2/Δexon6)
presented a complete absence of an external tail (Fig. 4f and Table 2).
This phenotype was validated through multiple litters of mice
generated from breeding pairs between different TbxtΔexon6/+ founder
lines and TbxtinsRCS2 mice of both heterozygotes (TbxtinsRCS2/+) and
homozygotes (TbxtinsRCS2/insRCS2) (Table 2). Moreover, TbxtinsRCS2/Δexon6 mice
constituted less than the expected 50% among the offspring from breeding
TbxtΔexon6/+ and TbxtinsRCS2/insRCS2 mouse pairs, which indicated that some
TbxtinsRCS2/Δexon6 embryos may not survive through development. Thus,
although the exon 6-deletion heterozygotes (TbxtΔexon6/+) exhibited
incomplete penetrance of tail phenotypes, when crossed with the
TbxtinsRCS2 allele, the phenotype was strong, which suggests that production
of the tail depends on a minimal abundance of the Tbxt full-length
isoform. Alternatively, suppression of tail development depends on a
higher-than-threshold abundance of TbxtΔexon6 transcript (Fig. 4f and

Fig. 4 | Introducing inverted intronic sequence pairs induces short-tail
phenotypes in mouse models. a, Schematic of the mouse Tbxt gene structure
with the inserted human AluSx1–AluY pair (TbxtinsASAY). The engineering of the
TbxtinsASAY model involved a two-step strategy specified in Extended Data Fig. 7a
(Methods). b, Gene structure of the TbxtinsRCS2 model with an insertion of a 220 bp
RCS from intron 6 to intron 5 of Tbxt (Methods). c,d, Tail length of TbxtinsASAY
mice (c) and TbxtinsRCS2 mice (d) across age, grouped by sex and genotypes.
Tbxt+/+ is the wild type. Data in c and d are presented as the mean ± s.d. of tail
length (mm) in the corresponding group. Each mouse group included 4–11
mice from multiple litters, with dots indicating individual data points of the
group. e, Tailbud-expressed Tbxt transcripts detected by RT-PCR using E10.5
mouse embryos across genotypes from TbxtinsASAY/+ (left) or TbxtinsRCS2/+ (right)
intercrossing experiments. RT-PCR results are presented as biological
duplicates, with consistent results obtained from more independent embryos
across genotypes. f, Representative tail phenotypes across mouse lines,
including wild type, TbxtinsASAY/insASAY, TbxtinsRCS2/insRCS2 and TbxtinsRCS2/Δexon6. Each
included both male (M) and female (F) mice. g, Summary schematic of the
correspondence between the relative abundance of Tbxt isoforms in mice of
different genotypes and their observed tail phenotypes.


Table 2 | Genotype and phenotype analyses of TbxtΔexon6/+ and
TbxtinsRCS2 breeding results

Breeding type                        Offspring genotype         Total no.   No tail    Short tail   Kinked tail   Long tail
TbxtinsRCS2/+ × TbxtΔexon6/+         TbxtinsRCS2/Δexon6         7           7 (2)*     0            0             0
                                     TbxtΔexon6/+               11          0          0            1             10
                                     TbxtinsRCS2/+ or Tbxt+/+   12          0          0            0             12
TbxtinsRCS2/insRCS2 × TbxtΔexon6/+   TbxtinsRCS2/Δexon6         12          12 (4)*    0            0             0
                                     TbxtinsRCS2/+              39          0          0            0             30

Note that tail phenotypes were categorized into no tail, short tail, kinked tail and long tail, as
exemplified in Fig. 3e.

*Numbers in parentheses indicate that 2 mice out of the 7, and 4 out of the 12, were runts and died
between 1 and 3 months after birth.


Table 2). Together, these results demonstrate that the relative
abundance of Tbxt isoforms is important for regulating tail development.


Homozygous removal of Tbxt exon 6 is lethal


Our mouse work enabled us to study tail phenotypes in mutant
mice across different relative abundances of the Tbxt full-length and
TbxtΔexon6 transcripts. We observed that mice expressing
a higher abundance of the Tbxt full-length transcript
had longer tails, whereas mice with short-tail or no-tail
phenotypes expressed a higher abundance of the TbxtΔexon6 transcript
(Fig. 4g). To study the extreme case in which the mice express only the
TbxtΔexon6 transcript and not the Tbxt full-length transcript, we
investigated the developmental phenotypes of TbxtΔexon6/Δexon6 mice through
intercrossing experiments. Intercrossing TbxtΔexon6/+ mice across multiple
litters (and replicated in different founder lines), we failed to
produce viable homozygotes (Table 1). Dissecting intercrossed stage
E11.5 embryos showed that homozygotes either had arrested development
at approximately stage E9 or developed with spinal cord malformations
that consequently led to death at birth (Extended Data Fig. 8a).
Notably, one TbxtΔexon6/Δexon6 pup that died exhibited neural-tube-closure
defects similar to the spina bifida condition in humans (Extended Data
Fig. 8b). Moreover, a TbxtΔexon6/+ pup that died after birth also exhibited
neural-tube-closure defects (Extended Data Fig. 8c). Together, these
results indicate that the expression of the TbxtΔexon6 isoform may induce
neural tube defects.

The TbxtΔexon6 transcript may lead to the production of a shortened
transcription factor with limited interactions with other factors or one
that exhibits additional functional interactions. To begin to study the
effect of this isoform on known Tbxt target genes, we analysed the
transcriptomes of differentiated mouse ES cell lines from the wild-type,
TbxtinsASAY/insASAY, TbxtΔexon6/+ and TbxtΔexon6/Δexon6 genotypes (Extended
Data Fig. 9 and Supplementary Data 5). Gene expression of Tbxt targets
varied across mouse ES cell lines exhibiting different ratios of long and
short Tbxt isoforms (Extended Data Fig. 9), which indicated a complicated
gene regulation network. Additional work will be required to
address the possibility that the combination of the two Tbxt isoforms
leads to new regulatory functionality.


Discussion 


We presented evidence for a plausible evolutionary scenario for tail-loss
evolution in hominoids, which involves the insertion of an AluY element
into an intron of TBXT. We showed that, rather than directly interfering
with a splice site, this element interacts with a simian-shared AluSx1
element in the neighbouring intron, leading to a hominoid-specific
AS isoform of TBXT (Fig. 1). Experimental deletion of either AluY or
its interaction partner AluSx1 eliminated this TBXT AS in differentiated
human ES cells (Fig. 2). When we engineered the mouse Tbxt gene
with the human TBXT gene structure by inserting the AluSx1–AluY
pair (as well as Alu-independent inverted RCSs in a separate mouse ES
cell line), we confirmed production of the same exon-skipped splicing
isoform (Fig. 4).

The AS mediated by Alu pairing in TBXT demonstrates how an interaction
between intronic transposable elements can substantially
modulate gene function to affect a complex trait. The human genome
contains around 1.8 million copies of short interspersed nuclear
elements (including about 1 million Alu elements), of which more than
60% are intronic. Systematically searching for such interactions may
lead to the identification of additional functional roles by which these
elements affect human development and disease. Notably, inverted Alu
pairs can facilitate the biogenesis of exonic circular RNAs (circRNAs)
through ‘backsplicing’. Thus, it is an interesting possibility that the
interactions between paired transposable elements might create both
functional splice variants and circRNA isoforms from the same genetic
locus. Furthermore, our results demonstrated that a completely different
(non-Alu) inverted repeat sequence in the introns flanking an
exon may also lead to exon skipping. Thus, a global search for such
sequence configurations might reveal additional instances of exon
skipping.

The main results of our mouse work demonstrated a correspondence
between the relative abundance of Tbxt isoforms and tail-length
phenotypes (Fig. 4g). Expression of the TbxtΔexon6 transcript in mice, along
with the full-length transcript, was sufficient to induce short-tail to
no-tail phenotypes (Fig. 3). Moreover, Tbxt AS induced by the intronic
RCS pair stably modulated tail length (TbxtinsRCS2/insRCS2 mice; Fig. 4).
Finally, we showed that a compound heterozygote with an increased
relative abundance of the TbxtΔexon6 transcript (TbxtinsRCS2/Δexon6 mice)
stably exhibits a no-tail phenotype (Fig. 4f,g and Table 2).

Previous studies have shown that the peptide encoded by the exon 6
sequence constitutes part of the transcription regulation domains,
but not the DNA-binding domain (Extended Data Fig. 3e,f). Thus, the
AS-induced TBXTΔexon6 transcript may encode a transcription factor
with altered transcription regulation function. Indeed, our
transcriptomics analyses of in vitro differentiated mouse ES cells across
genotypes found that cells expressing both Tbxt isoforms have distinct
transcriptome features compared with wild-type cells or cells
homozygous for the TbxtΔexon6 deletion. Notably, this AluY insertion-induced
TBXTΔexon6 isoform is different from previously reported mutants
of this gene. Future work is required to reveal the detailed
DNA-binding pattern and the transcription regulation functions that
the TBXT(Δexon6) isoform protein may play in mediating mesoderm
initiation and tail-loss development.

These results support an inference of how our hominoid ancestors
evolved the loss of the tail. In this scenario, AluY insertion induced
either the shortening or the partial loss of the tail in early hominoid
ancestors. However, even if the AluY insertion substantially influenced tail-loss
evolution in hominoids, additional genetic changes may have acted to
stabilize the no-tail phenotype (Extended Data Fig. 10). Such possible
hominoid-specific variants in tail-development-related genes (such as
those presented in Supplementary Data 1–4) may have pre-existed in
the ancestral genome or occurred after the AluY insertion. Such a possible
set of genetic events suggests that a change to the AluY element in
modern hominoids would be unlikely to result in the reappearance of
the tail. Moreover, tail loss or reduction occurred independently multiple
times throughout primate evolution, including in loris (Lorisidae),
mandrill (Mandrillus) and some species of macaques (Macaca). As the
genome sequences of an increasing number of primates become available,
it will be interesting to study aspects of convergent evolution
involved in the diverse genetic mechanisms that mediated tail-loss
evolution.
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The specific evolutionary pressures relating to the loss of the tail 
in hominoids are not clear, although they are probably involved in 
enhanced locomotion in the transition to a non-arboreal lifestyle. 
We suggest, however, that the selective advantage must have been 
strong because the loss of the tail may have included an evolution- 
ary trade-off of neural tube defects, as demonstrated by the presence 
of neural-tube-closure defects in mice expressing the TbxtΔexon6 transcript
(Extended Data Fig. 8). Notably, mutations leading to neural
tube defects and/or sacral agenesis have been detected in the coding 
and noncoding regions of the TBXT gene. We therefore speculate
that the evolutionary trade-off involving the loss of the tail, made
approximately 25 million years ago, may continue to influence human
health today. 
Methods


Comparative genomics analyses of tail-development-related 
genes 
Hominoid evolution represents an extended stage in primate evolu- 
tion that involved many phenotypic changes and widespread genomic 
sequence changes. Therefore, querying for hominoid-specific mutations
across the genome results in tens of millions of candidates, with
most of them located in non-coding regions. We used the following
criteria to define a candidate variant that may contribute to
tail-loss evolution in hominoids: (1) the variant has to be hominoid-specific, which
means that the variant sequence or amino acid is unique to hominoid 
species and cannot be shared by any other species that have tails; 
(2) the function of the associated genes relates to tail development. 
Tail-development-related genes in vertebrates were collected from 
the MGI phenotype database and additional literature not covered by 
the MGI database. The initial analyses mainly covered genes extracted 
from the MGI term MP0003456 for ‘absent tail’ phenotype
(https://www.informatics.jax.org/vocab/mp_ontology/MP:0003456), to a total
of 31 genes. Additional analyses included genes from MP0002632 for
‘vestigial tail’ (https://www.informatics.jax.org/vocab/mp_ontology/MP:0002632)
and MP0000592 for ‘short tail’
(https://www.informatics.jax.org/vocab/mp_ontology/MP:0000592). Together, the final list of
genes related to vertebrate tail development included 140 genes (as of
MGI updates in February 2023), the mutations of which are reported
to be related to tail-reduction phenotypes (Supplementary Data 1). 
Gene structure annotations of the 140 genes were downloaded 
from BioMart of Ensembl 109 (https://useast.ensembl.org/info/data/biomart/index.html).
The longest transcript with the most exons was
selected for each gene. Multiz30way alignments of genomic sequences
across 27 primate species were downloaded from the UCSC Genome
Browser. We selected all six hominoid species (hg38, gorGor5, panTro5,
panPan2, ponAbe2 and nomLeu3) to calculate a hominoid-consensus
sequence, and used two non-hominoid species (pig-tailed macaque,
macNem1, and marmoset, calJac3) as the outgroups. The homologous
regions of the 140 genes, together with 10,000 bp of both upstream and
downstream sequence, in the 8 species were extracted from the
Multiz30way alignment using bedtools (v.2.30.0). Hominoid-specific
variants were identified using the following parameters: SNVs or sub- 
stitutions shared by six hominoid species but different in any of the two 
outgroup monkey species were identified as putative hominoid-specific 
SNVs (Supplementary Data 2); DNA sequences present in all six homi- 
noid species but absent in either of the two outgroup monkey spe- 
cies were identified as hominoid-specific insertions (Supplementary 
Data 3); and DNA sequences absent from all six hominoid species but 
present in both of the two outgroup monkey species were identified 
as hominoid-specific deletions (Supplementary Data 4). Notably, our 
criteria for analysing hominoid-specific variants may include a small 
proportion of false-positive hits that are outgroup-specific variants. 
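
The classification rules above can be sketched per alignment column as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the per-column treatment of insertions and deletions (real indels span several columns), the reading of "either" as "at least one outgroup", and the handling of ambiguous bases are all assumptions made for the example.

```python
# Assembly names follow the Methods; the per-column logic is an
# illustrative assumption, not the authors' implementation.
HOMINOIDS = ("hg38", "gorGor5", "panTro5", "panPan2", "ponAbe2", "nomLeu3")
OUTGROUPS = ("macNem1", "calJac3")
GAP = "-"


def classify_column(column):
    """Classify one alignment column given {assembly: base or '-'}.

    Returns 'snv', 'insertion', 'deletion' or None.
    """
    hominoid_states = {column[name].upper() for name in HOMINOIDS}
    outgroup_states = [column[name].upper() for name in OUTGROUPS]

    # All six hominoids must share the same state.
    if len(hominoid_states) != 1:
        return None
    shared = next(iter(hominoid_states))

    # Skip columns with ambiguous bases (hypothetical handling).
    if shared == "N" or "N" in outgroup_states:
        return None

    if shared == GAP:
        # Absent from all hominoids but present in both outgroups.
        if all(state != GAP for state in outgroup_states):
            return "deletion"
        return None

    # Present in all hominoids but absent in at least one outgroup.
    if any(state == GAP for state in outgroup_states):
        return "insertion"
    # Shared hominoid base that differs in at least one outgroup.
    if any(state != shared for state in outgroup_states):
        return "snv"
    return None


def make_column(hominoid_state, macaque_state, marmoset_state):
    """Build a toy column in which all hominoids carry the same state."""
    column = {name: hominoid_state for name in HOMINOIDS}
    column["macNem1"] = macaque_state
    column["calJac3"] = marmoset_state
    return column
```

Because a column is accepted when only one outgroup differs, a change that arose in a single outgroup lineage passes the filter, which is the source of the outgroup-specific false positives noted above.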
We used the Ensembl variant effect predictor (integrated in
Ensembl 109) to infer the potential functional impact of the detected
hominoid-specific SNVs, insertions and deletions. Owing to the lack of
an ancestral genome as the reference sequence, variant effect predictor
predictions were performed inversely using the human/hominoid
genomic sequence as the reference allele, and the outgroup sequence
served as the alternative allele. SNVs annotated as either ‘deleterious’
(<0.05) in the SIFT score or ‘damaging’ (>0.446) in the PolyPhen score
(53 instances), and insertions (6 instances) or deletions (2 instances)
that affect protein sequences were collected for further manual inspection
and comparison across species. This additional inspection was performed
across the Cactus Alignment of the genomes across 241 species
in the UCSC Genome Browser Comparative Genomics module. This
inspection found that most of the annotated variants that may affect 
host gene function fell into three categories: (1) outgroup-specific 
variants; (2) false-positive annotation of the variant function in a minor
transcript; and (3) missense variants in hominoid species but sharing
the same amino acid in at least one other tailed species. These variants 
were not considered as candidates that may have affected tail-loss 
evolution in hominoids. Excluding these variants, we identified nine 
variants as true hominoid-specific coding region mutations, including 
seven SNVs and two insertions and deletions (Supplementary Data 1). 
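
The SIFT and PolyPhen thresholds quoted above amount to a simple either-or filter. The sketch below assumes each SNV annotation has already been reduced to two optional scores; the function name and the example scores are hypothetical and are not taken from the Supplementary Data.

```python
SIFT_DELETERIOUS_BELOW = 0.05     # 'deleterious' when SIFT score < 0.05
POLYPHEN_DAMAGING_ABOVE = 0.446   # 'damaging' when PolyPhen score > 0.446


def needs_manual_inspection(sift_score, polyphen_score):
    """Return True if an SNV passes either threshold.

    A score of None means the predictor returned no annotation
    (hypothetical convention for this example).
    """
    deleterious = sift_score is not None and sift_score < SIFT_DELETERIOUS_BELOW
    damaging = (
        polyphen_score is not None
        and polyphen_score > POLYPHEN_DAMAGING_ABOVE
    )
    return deleterious or damaging


# Made-up annotations: (variant label, SIFT score, PolyPhen score).
example_snvs = [
    ("variant_a", 0.01, 0.10),   # deleterious by SIFT only
    ("variant_b", 0.30, 0.90),   # damaging by PolyPhen only
    ("variant_c", 0.30, 0.10),   # passes neither threshold
    ("variant_d", None, None),   # no prediction available
]

flagged = [
    label
    for label, sift, polyphen in example_snvs
    if needs_manual_inspection(sift, polyphen)
]
```

Variants flagged this way are only candidates; the manual cross-species comparison described above is what removes outgroup-specific and minor-transcript hits.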
Following identification of top candidates, protein sequence align- 
ments across representative vertebrate species were downloaded from 
the NCBI database and analysed using the MUSCLE algorithm with 
MEGA X software and default settings.


RNA secondary structure prediction 

RNA secondary structure prediction of the human TBXT exon 5-intron 5-exon 6-intron 6-exon 7
sequence was performed using
RNAfold (v.2.6.0) through the ViennaRNA Web Server (http://rna.tbi.univie.ac.at/).
The algorithm calculates the folding probability using
a minimum free energy matrix with default parameters. In addition,
the calculation included the partition function and base pairing probability
matrix. Notably, human TBXT transcripts were annotated to have
a 5′ untranslated region exon, making its exon numbers differ from
most other species, including mouse. To simplify, we referred to
the first coding exon of human TBXT as exon 1 and thus the alternatively
spliced exon as exon 6, consistent with mouse Tbxt. RNA secondary
structure prediction used the DNA sequence from exon 5 to exon 7 
following this order. 


Human ES cell culture and differentiation 

Human ES cells (WA01, also called H1, from WiCell Research Institute)
were authenticated by the distributor WiCell using short tandem repeat
profiling. Human ES cells were cultured
in feeder-free conditions on tissue-culture-grade plates coated with 
human ES cell-qualified Geltrex (Gibco, A1413302). Geltrex was 1:100 
diluted in DMEM/F-12 (Gibco, 11320033) supplemented with 1x Glu- 
tamax (100X, Gibco, 35050061) and 1% penicillin-streptomycin (Gibco, 
15070063). Before seeding human ES cells, the plate was treated with
Geltrex working solution in a tissue culture incubator (37 °C and 5%
CO2) for at least 1 h.

StemFlex medium (Gibco, A3349401) was used for human ES cell 
maintenance and culturing in a feeder-free condition according to 
the manufacturer’s protocol. In brief, StemFlex complete medium 
was made by combining StemFlex basal medium (450 ml) with 50 ml 
of StemFlex supplement (10x) plus 1% penicillin-streptomycin. Each 
Geltrex-coated well on a 6-well plate was seeded with 200,000 cells to
obtain about 80% confluence in 3-4 days. Human ES cells were cryopreserved
in PSC Cryomedium (Gibco, A2644601). The culture medium
was supplemented with 1x RevitaCell (100x, Gibco, A2644501, which is
also included in the PSC Cryomedium kit) when cells had gone through
stressed conditions, such as freezing-and-thawing or nucleofection. 
RevitaCell supplemented medium was replaced with regular StemFlex 
complete medium on the second day. Human ES cells grown under the 
RevitaCell condition might become stretched but would recover after 
returning to the normal StemFlex complete medium. All human ES 
cell lines tested negative during our routine quantitative PCR-based 
mycoplasma tests. 

The human ES cell differentiation assay to induce a gene expression 
pattern of the primitive streak state was adapted from a previously published
method. On day -1, freshly cultured human ES cell colonies were
dissociated into clumps (2-5 cells) using Versene buffer (with EDTA,
Gibco, 15040066). The dissociated cells were seeded on Geltrex-coated
6-well tissue culture plates at 25,000 cells per cm2 (0.25 M per well in
the 6-well plates) in StemFlex complete medium. Differentiation to the 
primitive streak state was initiated on the next day (day 0) by switching 
StemFlex complete medium to basal differentiation medium. Basal 
differentiation medium (50 ml) was made using 48.5 ml DMEM/F-12,
1% Glutamax (500 µl), 1% ITS-G (500 µl, Gibco, 41400045) and
1% penicillin-streptomycin (500 µl), and supplemented with 3 µM
GSK3 inhibitor CHIR99021 (10 µl of 15 mM stock solution in DMSO;
Tocris, 4423). The cells were collected at differentiation day 1 to 3 for
downstream experiments, which confirmed the expression fluctuations
of mesoderm genes (TBXT and MIXL1) in a 3-day differentiation
period (Extended Data Fig. 3).
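
The volumes in the basal differentiation medium recipe can be checked with the dilution relation C1 × V1 = C2 × V2. The snippet below is a worked arithmetic check under the stated recipe; the helper names are introduced here for illustration only.

```python
import math


def stock_volume_ul(final_um, final_volume_ml, stock_mm):
    """Volume of stock (in µl) needed to reach a final concentration.

    Derived from C1 * V1 = C2 * V2: (µmol/l * ml) / (mmol/l) simplifies to µl.
    """
    return final_um * final_volume_ml / stock_mm


def percent_v_v(component_ml, total_ml):
    """Volume fraction of a component, as a percentage."""
    return 100.0 * component_ml / total_ml


# Component volumes (ml) as listed for 50 ml of basal differentiation medium.
components_ml = {
    "DMEM/F-12": 48.5,
    "Glutamax": 0.5,
    "ITS-G": 0.5,
    "penicillin-streptomycin": 0.5,
}

total_ml = sum(components_ml.values())  # 50 ml
chir_ul = stock_volume_ul(final_um=3, final_volume_ml=total_ml, stock_mm=15)
glutamax_percent = percent_v_v(components_ml["Glutamax"], total_ml)
```

With these inputs the components add up to 50 ml, each 500 µl supplement is 1% by volume, and 10 µl of the 15 mM CHIR99021 stock gives 3 µM; the added 10 µl changes the total volume by only 0.02%.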


Mouse ES cell culture and differentiation 

The mouse ES cell line (MK6) derived from the C57BL/6J mouse 
strain was obtained from the NYU Langone Health Rodent Genetic 
Engineering Laboratory. The wild-type MK6 mouse ES cell line was 
authenticated by its competence for contributing to embryos when 
cultured on feeder-cell-dependent conditions followed by blastomere 
injection. MK6 mouse ES cells used in this study were cultured in both 
feeder-dependent and feeder-free culture conditions depending on 
the purposes of the experiment. All mouse ES cell lines tested nega- 
tive during our routine quantitative PCR-based mycoplasma tests. For 
feeder-dependent mouse ES cell culture conditions, mouse ES cells 
were plated on a pre-seeded monolayer of mouse embryonic fibroblast
(MEF) cells (CellBiolabs, CBA-310). MEF-coated plates were prepared by
seeding 50,000 cells per cm2 in tissue culture plates treated with 0.1%
gelatin solution (EMD Millipore, ES-006-B). MEF culturing medium
was made from DMEM (Gibco, 11965118) with 10% FBS (GeminiBio,
100-500), 0.1 mM MEM non-essential amino acids (Gibco, 11140050),
1x Glutamax (Gibco, 35050061) and 1% penicillin-streptomycin (Gibco,
15070063). Mouse ES cell medium was made from Knockout DMEM
(Gibco, 10829018) containing 15% (v/v) FBS (Hyclone, SH30070.03),
0.1 mM β-mercaptoethanol (Gibco, 31350010), 1x MEM non-essential
amino acids (Gibco, 11140050), 1x Glutamax (Gibco, 35050061),
1x nucleosides (Millipore, ES-008-D) and 1,000 units ml−1 LIF (EMD
Millipore, ESG1107). 

For feeder-free mouse ES cell culture conditions, cells were grown 
on tissue-culture-grade plates that were pre-coated with mouse ES 
cell-qualified 0.1% gelatin (EMD Millipore, ES-006-B) at room tem- 
perature for at least 30 min. Before seeding mouse ES cells, feeder-free 
mouse ES cell culturing medium was added to a gelatin-treated plate and
warmed in a 37 °C and 5% CO2 incubator for at least 30 min. Feeder-free
mouse ES cell culturing medium, also called ‘80/20’ medium, comprises
80% 2i medium and 20% of the above-mentioned mouse ES cell
medium by volume. The 2i medium was made from a 1:1 mix of Advanced
DMEM/F-12 (Gibco, 12634010) and Neurobasal-A (Gibco, 10888022),
followed by adding 1x N2 supplement (Gibco, 17502048), 1x B-27 supplement
(Gibco, 17504044), 1x Glutamax (Gibco, 35050061), 0.1 mM
β-mercaptoethanol (Gibco, 31350010), 1,000 units ml−1 LIF (Millipore,
ESG1107), 1 µM MEK1/2 inhibitor (Stemgent, PD0325901) and 3 µM
GSK3 inhibitor CHIR99021 (Tocris, 4423).

The mouse ES cell differentiation protocol for inducing Tbxt gene 
expression was adapted from a previously described method in a 
feeder cell-free condition. Cells were first plated in 80/20 medium
for 24 h on a gelatin-coated 6-well plate, followed by switching to 
N2/B27 medium without LIF or 2i for another 2 days of culture. The 
N2/B27 medium (50 ml) was made with 18 ml Advanced DMEM/F-12, 
18 ml Neurobasal-A, 9 ml Knockout DMEM, 2.5 ml Knockout Serum 
Replacement (Gibco, 10828028), 0.5 ml N2 supplement, 1 ml B27 supplement,
0.5 ml Glutamax (100x), 0.5 ml nucleosides (100x) and 0.1 mM
β-mercaptoethanol. Then the N2/B27 medium was supplemented with
3 µM GSK3 inhibitor CHIR99021 to induce differentiation (day 0). The
cells were collected at differentiation day 1 to 3 for downstream experiments,
which showed consistent results of Tbxt gene expression fluctuations
in a 3-day differentiation period.
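
For batches other than 50 ml, the N2/B27 recipe can be scaled proportionally. The sketch below encodes only the components whose volumes are stated in the text (β-mercaptoethanol is given as a concentration, so it is omitted); the dictionary keys and the `scale_recipe` helper are illustrative names, not part of the published protocol.

```python
import math

# Volumes (ml) for a 50 ml batch of N2/B27 medium, as listed in the text.
N2B27_50ML = {
    "Advanced DMEM/F-12": 18.0,
    "Neurobasal-A": 18.0,
    "Knockout DMEM": 9.0,
    "Knockout Serum Replacement": 2.5,
    "N2 supplement": 0.5,
    "B27 supplement": 1.0,
    "Glutamax (100x)": 0.5,
    "nucleosides (100x)": 0.5,
}


def scale_recipe(recipe_ml, target_total_ml):
    """Scale every component so that the volumes sum to target_total_ml."""
    base_total = sum(recipe_ml.values())
    factor = target_total_ml / base_total
    return {name: volume * factor for name, volume in recipe_ml.items()}


def final_fold(stock_fold, component_ml, total_ml):
    """Final concentration (in 'x') of a supplement supplied as an N-fold stock."""
    return stock_fold * component_ml / total_ml


batch_total_ml = sum(N2B27_50ML.values())
recipe_100ml = scale_recipe(N2B27_50ML, 100.0)
glutamax_fold = final_fold(100, N2B27_50ML["Glutamax (100x)"], batch_total_ml)
```

The listed volumes sum to exactly 50 ml, and the 100x Glutamax and nucleoside stocks end up at 1x, which is a quick consistency check on the recipe.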


CRISPR targeting 

All guide RNAs of the CRISPR experiments were designed using 
the CRISPOR algorithm integrated in the UCSC Genome Browser. 
Guide RNAs were cloned into the pX459V2.0-HypaCas9 plasmid
(Addgene, plasmid 62988) or its custom derivative by replacing the
puromycin-resistance gene with the blasticidin-resistance gene. Guide 
RNAs in this study were designed in pairs to delete the intervening 
sequences. Insertion sites for the AluSx1 and AluY pair in mouse Tbxt 
(TbxtinsASAY) were selected by the guide RNA quality and the relative
distance compared to the human TBXT gene structure. The insertion
site for the RCS element (TbxtinsRCS) was the same as for insertion of the
AluY element. The CRISPR-targeting sites and guide RNA sequences 
are listed in Supplementary Table 2. 

All oligonucleotides (plus Golden-Gate assembly overhangs) were 
synthesized by Integrated DNA Technologies (IDT) and ligated into an 
empty pX459V2.0 vector following the standard Golden Gate Assembly 
protocol using BbsI restriction enzyme (NEB, R3539). The constructed 
plasmids were purified from 3 ml Escherichia coli cultures using a ZR 
Plasmid MiniPrep Purification kit (Zymo Research, D4015) for sequence
verification. Plasmids for delivering into ES cells were purified from
250 ml E. coli cultures using a PureLink HiPure Plasmid Midiprep kit
(Invitrogen, K210005). To facilitate DNA delivery to ES cells through
nucleofection, the purified plasmids were dissolved in Tris-EDTA buffer
(pH 7.5) to a concentration of at least 1 µg µl−1 in a sterile hood.


DNA delivery 

DNA delivery into human or mouse ES cells for CRISPR-Cas9 targeting 
was performed using a Nucleofector 2b device (Lonza, BioAAB-1001). 
A Human Stem Cell Nucleofector kit 1 (VPH-5012) and a mouse ES cell
Nucleofector kit (Lonza, VVPH-1001) were used for delivering DNA 
into human and mouse ES cells, respectively. ES cells were double-fed 
the day before the nucleofection experiment to maintain a superior 
condition. 

Before performing nucleofection on human ES cells, 6-cm tissue 
culture plates were treated with 0.5 µg cm−2 rLaminin-521 in a 37 °C and
5% CO2 incubator for at least 2 h. rLaminin-521-treated plates provide
the best viability when seeding human ES cells as single cells. Cultured 
human ES cells were washed with DPBS and dissociated into single cells 
using TrypLE Select Enzyme (no phenol red; Gibco, 12563011). One 
million human ES cell single cells were nucleofected using program 
A-023 according to the manufacturer’s instructions for the Nucleofec- 
tor 2b device. Transfected cells were transferred onto the rLaminin- 
521-treated 6-cm plates with pre-warmed StemFlex complete medium 
supplemented with 1x RevitaCell but not penicillin-streptomycin. 
Antibiotic selection was performed 24 h after nucleofection with puromycin
(final concentration of 0.8 µg ml−1; Gibco, A1113802).

Mouse ES cells were dissociated into single cells using StemPro 
Accutase (Gibco, A1110501), and 5 million cells were transfected 
using program A-023 according to the manufacturer’s instructions. 
Exon 6 deletion in mouse ES cells was performed using cells cultured 
in the feeder-free condition. Nucleofected cells were plated on 0.1% 
gelatin-treated 10-cm plates, followed by antibiotic selection 24 h 
after nucleofection with blasticidin (final concentration of 7.5 µg ml−1;
Gibco, A1113903). The insertion of the AluSx1-AluY pair and insRCS
engineering were performed using mouse ES cells cultured on a
feeder-dependent condition. Mouse ES cells were plated on a mon- 
olayer of MEF cells seeded on 0.1% gelatin-treated 10-cm plates, fol- 
lowed by antibiotic selection 24 h after nucleofection. 

Together with the pX459V2.0-HypaCas9-gRNA plasmids for nucleo- 
fection, single-strand DNA oligonucleotides were co-delivered for 
microhomology-induced deletion of the targeted sequences. These 
ssDNA sequences were synthesized by IDT through its Ultramer DNA 
Oligo service, including three phosphorothioate bond modifications 
on each end. Detailed sequence information for these long ssDNA oligonucleotides
is listed in Supplementary Table 3.

For TbxtinsASAY and TbxtinsRCS engineering, homology-based repair
template plasmids, including a selection marker gene puro-ΔTK
(puromycin-resistance gene for positive selection and ΔTK, a truncated
version of herpes simplex virus type 1 thymidine kinase, for negative
selection, as presented in Extended Data Fig. 7), were transfected
together with the pX459V2.0-HypaCas9-gRNA plasmids. Following
nucleofection and antibiotic selection (0.8 µg ml−1 puromycin for 3 days
starting from the second day of nucleofection), single clones were 
picked, followed by PCR genotyping of CRISPR-Cas9-targeted loci 
(exon 6 deletion, inserting AluY, inserting AluSx1 or inserting RCS). The 
genotyping PCR primers are listed in Supplementary Table 4. 

PCR genotyping-confirmed clones were further validated using 
Capture-seq (see below) to confirm the genotype and to exclude 
the possibility of any random integration of plasmid DNA. Subse- 
quently, Cre recombinase was transiently introduced to remove the 
selection marker puro-ΔTK. Cells were treated with 250 nM ganciclovir
to select for ΔTK-negative cells, that is, cells from which the selection
marker gene had been removed. Following isolation of single mouse ES cell clones
of TbxtinsASAY and TbxtinsRCS mouse ES cells without the selection marker
gene, these clones were used for downstream experiments, including 
in vitro differentiation assays and blastocyst injection for generating 
mouse models. 


Capture-seq genotyping 

Capture-seq, or targeted sequencing of the loci of interest, was performed
as previously described. Conceptually, Capture-seq uses
custom biotinylated probes to pull down the sequences at genomic
loci of interest from the standard whole-genome sequencing libraries,
thereby enabling sequencing of the specific genomic loci at a much
higher depth while reducing the cost.

Genomic DNA was purified from mouse ES cells or from ear punches 
of mice of interest using a Zymo Quick-DNA Miniprep Plus kit (D4068) 
according to the manufacturer’s protocol. DNA sequencing librar- 
ies compatible for Illumina sequencers were prepared following the 
standard protocol. In brief, 1 µg of gDNA was sheared to 500-900 base
pairs in a 96-well microplate using a Covaris LE220 (450 W, 10% duty 
factor, 200 cycles per burst and 90-s treatment time), followed by 
purification with a DNA Clean and Concentrate-5 kit (Zymo Research, 
D4013). Sheared and purified DNA was then treated with end repair
enzyme mix (T4 DNA polymerase, Klenow DNA polymerase and T4
polynucleotide kinase, NEB, M0203, M0210 and M0201, respectively),
and A-tailed using Klenow fragment (3′→5′ exo−) (NEB, M0212). Illumina
sequencing library adapters were subsequently ligated to DNA ends
followed by PCR amplification with KAPA 2X Hi-Fi Hotstart Readymix
(Roche, KR0370).

Custom biotinylated probes were prepared as bait through nick 
translation using BAC DNA and/or plasmids as the template. The probes 
were prepared to comprehensively cover the entire locus. We used BAC 
lines RP24-88H3 and RP23-159G7, purchased from BACPAC Genomics, 
to generate bait probes covering the mouse Tbxt locus and about 200 kb
flanking sequences in both upstream and downstream regions. The 
pooled whole-genome sequencing libraries were hybridized with the 
biotinylated baits in solution and purified using streptavidin-coated 
magnetic beads. Following pull-down, DNA sequencing libraries were 
quantified using a Qubit 3.0 Fluorometer (Invitrogen, Q33216) with 
a dsDNA HS Assay kit (Invitrogen, Q32851). The sequencing libraries 
were subsequently sequenced on an Illumina NextSeq 500 sequencer
in paired-end mode. 

Sequencing results were demultiplexed using Illumina bcl2fastq 
(v.2.20), requiring a perfect match to indexing BC sequences. 
Low-quality reads or bases and Illumina adapter sequences were 
trimmed using Trimmomatic (v.0.39). Reads were then mapped to the
mouse genome (mm10) using bwa (v.0.7.17). The coverage and mutations
in and around the Tbxt locus were checked through visualization
in a mirror version of the UCSC Genome Browser.
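
The read-processing steps above can be laid out as three command lines. The sketch below only builds the argument lists and does not run anything. Several details are assumptions for illustration and are not stated in the text: the FASTQ file names, the `trimmomatic` wrapper name, the Trimmomatic step settings, and the choice of the `bwa mem` subcommand. Only the requirement of a perfect index match (here `--barcode-mismatches 0`) comes from the Methods.

```python
def capture_seq_commands(run_dir, sample, reference_fasta, adapters_fasta,
                         threads=4):
    """Return argument lists for demultiplexing, trimming and mapping.

    File names and trimming parameters are hypothetical placeholders.
    """
    # Hypothetical FASTQ names; bcl2fastq derives real names from the sample sheet.
    r1 = f"fastq/{sample}_R1.fastq.gz"
    r2 = f"fastq/{sample}_R2.fastq.gz"
    paired_1 = f"trimmed/{sample}_1P.fastq.gz"
    unpaired_1 = f"trimmed/{sample}_1U.fastq.gz"
    paired_2 = f"trimmed/{sample}_2P.fastq.gz"
    unpaired_2 = f"trimmed/{sample}_2U.fastq.gz"

    demultiplex = [
        "bcl2fastq",
        "--runfolder-dir", run_dir,
        "--output-dir", "fastq",
        "--barcode-mismatches", "0",  # perfect match to index sequences
    ]
    trim = [
        "trimmomatic", "PE", "-threads", str(threads),
        r1, r2, paired_1, unpaired_1, paired_2, unpaired_2,
        f"ILLUMINACLIP:{adapters_fasta}:2:30:10",  # assumed adapter settings
        "SLIDINGWINDOW:4:20",                      # assumed quality window
        "MINLEN:36",                               # assumed minimum length
    ]
    # Assumption: paired-end mapping with the 'mem' algorithm of bwa.
    align = ["bwa", "mem", "-t", str(threads), reference_fasta,
             paired_1, paired_2]
    return [demultiplex, trim, align]


commands = capture_seq_commands(
    run_dir="run_folder",          # hypothetical path
    sample="founder1",             # hypothetical sample name
    reference_fasta="mm10.fa",
    adapters_fasta="adapters.fa",  # hypothetical adapter file
)
```

Each list could be passed to `subprocess.run` once the paths are replaced with real ones; `bwa mem` writes SAM to standard output, so its output would need to be redirected to a file.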


Mouse experiments and generating TbxtΔexon6/+ mice
All mouse experiments were performed following NYULH’s animal 
protocol guidelines and performed at the NYU Langone Health Rodent
Genetic Engineering Laboratory. Mice were housed in the NYU Langone
Health BSL1 barrier facility in a 12-h light to 12-h dark cycle, with ambient
temperature and humidity conditions. All experimental procedures 
were approved by the Institutional Animal Care and Use Committee at 
NYU Langone Health. Wild-type C57BL/6J (strain 000664) mice were
obtained from The Jackson Laboratory. 

The TbxtΔexon6/+ heterozygous mouse model was generated through
zygotic microinjection of CRISPR reagents into wild-type C57BL/6J
zygotes (Jackson Laboratory strain 000664), adapting a previously
published protocol. In brief, Cas9 mRNA (MilliporeSigma, CAS9MRNA),
synthetic guide RNAs and single-stranded DNA oligonucleotide
were co-injected into 1-cell stage zygotes following the described procedures.
Synthetic guide RNAs were ordered from Synthego as their
custom CRISPRevolution sgRNA EZ kit, with the same targeting sites as
used in the CRISPR deletion experiment of mouse ES cells
(AUUUCGGUUCUGCAGACCGG and CAAGAUGCUGGUUGAACCAG). The co-injected
single-stranded DNA oligonucleotide is the same as described above.
Injected embryos were then in vitro cultured to the blastomeric stage, 
followed by embryo transferring to the pseudopregnant foster moth- 
ers. Following zygotic microinjection and transferring, founder pups 
were screened based on their abnormal tail phenotypes. DNA samples 
were collected through ear punches at about day 21 for PCR genotyping 
and Capture-seq validation to exclude off-targeting at the Tbxt locus. 

After confirming the genotype, TbxtΔexon6/+ founder mice were backcrossed
with wild-type C57BL/6J mice for generating heterozygous F1
mice. Owing to the varied tail phenotypes, intercrossing between F1
heterozygotes was performed in two categories: type 1 intercrossing
included at least one parent having no tail or a short tail, whereas
type 2 intercrossing was between two long-tailed F1 heterozygotes
(Table 1). As summarized in Table 1, both types of intercrossing
produced heterogeneous tail phenotypes in F2 TbxtΔexon6/+ mice,
thereby confirming the incomplete penetrance of tail phenotypes and
the absence of homozygotes (TbxtΔexon6/Δexon6). Adult mice (>12 weeks)
were anaesthetized for X-ray imaging of vertebrae using a Bruker In-Vivo
Xtreme IVIS imaging system. To confirm the embryonic phenotypes in
homozygotes, embryos were dissected at E11.5 gestation stage from 
the timed pregnant mice using a standard protocol. 


Generating TbxtinsASAY and TbxtinsRCS2 mice

The engineered TbxtinsASAY and TbxtinsRCS mouse ES cells were injected into
either C57BL/6J-albino (Charles River Laboratories, strain 493) blastocysts
for chimeric F0 founder mice or injected into B6D2F1/J (an F1 hybrid
between C57BL/6J female and DBA/2J male, Jackson Laboratory strain
100006) tetraploid blastocysts for homozygote F0 founder mice production.
The tetraploid complementation strategy aimed to generate
homozygous mice with the proposed genotype in the F0 generation.
Through multiple trials of injection using both mouse ES cell lines, we
achieved only one TbxtinsASAY/insASAY F0 founder mouse (male) but none for
the TbxtinsRCS mouse. However, during genotype screening for TbxtinsRCS
founder mice, we serendipitously identified a male grey mouse that
incorporated a heterozygous insertion in intron 5. Genotype analysis
revealed that the insertion was a 220 bp DNA sequence from intron 6 of
Tbxt (chromosome 17: 8439335-8439554, mm10), inserted in a reverse
complementary orientation into intron 5 at a designed CRISPR targeting
site (chromosome 17: 8438386, mm10). The inserted sequence insRCS2
in intron 5 therefore forms a 220 bp inverted complementary sequence
pair with its original sequence in intron 6 (chromosome 17: 8439335-
8439554, mm10), resembling the designed TbxtinsRCS and TbxtinsASAY gene
structures. This genotype was therefore called TbxtinsRCS2. Capture-seq
genotyping of this TbxtinsRCS2/+ mouse confirmed that the TbxtinsRCS2
allele is in the C57BL/6 background, whereas the wild-type Tbxt locus
of the TbxtinsRCS2/+ founder mouse is from the DBA/2J background. This
TbxtinsRCS2/+ mouse was therefore backcrossed to C57BL/6 wild-type mice
and further intercrossed between F1 heterozygotes to produce homozygotes
(TbxtinsRCS2/insRCS2) in the F2 generation. Capture-seq analysis of
TbxtinsRCS2/insRCS2 mice confirmed their C57BL/6 background at the Tbxt
locus (Extended Data Fig. 8). We also compared the tail phenotypes in
age-matched C57BL/6 and DBA/2J mice and found no difference (data
not shown), which suggested that any genetic background difference
between the two strains does not affect tail length. The TbxtinsRCS2 mice
(both heterozygotes and homozygotes) were therefore used for the
analysis of tail phenotypes.
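
The geometry of the insRCS2 allele can be illustrated with a short check of the interval length and of what "inverted complementary" means at the sequence level. This sketch assumes the quoted mm10 coordinates are 1-based and inclusive, and it uses a made-up toy sequence rather than the actual intron 6 sequence.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def interval_length(start, end):
    """Length of a 1-based, inclusive genomic interval (assumed convention)."""
    return end - start + 1


def forms_inverted_pair(inserted, original):
    """True if the inserted copy is the reverse complement of the original.

    Two such copies on the same transcript can base-pair with each other,
    which is the property shared with the inverted AluY-AluSx1 pair.
    """
    return reverse_complement(inserted) == original


# Coordinates of the intron 6 source sequence quoted in the text.
source_length = interval_length(8439335, 8439554)

# Toy example (not the real sequence).
toy_original = "AAGCT"
toy_inserted = reverse_complement(toy_original)
```

With the inclusive convention the interval spans 220 bp, which matches the reported insert size.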

The TbxtinsASAY and TbxtinsRCS2 founder mice, both male, were separately
backcrossed to wild-type C57BL/6J mice for generating heterozygous F1
pups, followed by intercrossing between F1 heterozygotes to generate
homozygotes in the F2 generation. With all genotypes available, mouse tail
lengths were measured monthly across genotypes and sex groups.
Additionally, two types of breeding pairs, TbxtinsRCS2/+ × TbxtΔexon6/+ and
TbxtinsRCS2/insRCS2 × TbxtΔexon6/+, were set up across different founder
lines of TbxtΔexon6/+ mice to analyse tail phenotypes in their offspring.
These results are summarized and presented in Table 2. 

To analyse the isoform expression patterns of mouse Tbxt in the 
embryonic tailbud region, wild-type, heterozygote and homozygote 
embryos from intercrossing experiments (TbxtinsRCS2/+ × TbxtinsRCS2/+,
TbxtinsASAY/+ × TbxtinsASAY/+) were dissected at the E10.5 gestation stage.
The tailbud of each embryo was collected for isolating total RNA,
together with embryonic tissue for gDNA to be used for genotyping.
These results are presented in Fig. 4e. 


Splicing isoform detection 

Total RNA was collected from undifferentiated and differentiated 
cells of both human and mouse ES cells, and from embryonic tail- 
bud samples, using a standard column-based purification kit (Qia- 
gen RNeasy Kit, 74004). DNase treatment was applied during RNA 
extraction to remove any potential DNA contamination. Following 
extraction, RNA quality was assessed through electrophoresis based 
on ribosomal RNA integrity. Reverse transcription was performed using
1 µg of high-quality total RNA for each sample and a High-Capacity
RNA-to-cDNA kit (Applied Biosystems, 4387406). DNA oligonucleo- 
tides used for RT-PCR and/or quantitative RT-PCR are listed in Sup- 
plementary Table 1. 


Transcriptomics analyses in differentiated mouse ES cells 
Total RNA samples isolated from day-1 in vitro-differentiated mouse ES
cell lines across wild-type, TbxtinsASAY/insASAY, TbxtΔexon6/+ and TbxtΔexon6/Δexon6
genotypes were used for bulk RNA sequencing analysis. RNA samples
were prepared using a standard column-based purification kit (Qiagen
RNeasy kit, 74004). Two biological replicates were prepared for
each mouse ES cell genotype, with the two TbxtΔexon6/Δexon6 mouse ES
cell samples coming from different clones. RNA sequencing libraries 
were prepared using a NEBNext Ultra II Directional RNA Library Prep kit 
(NEB, E7765L) through its polyA mRNA sequencing workflow by using 
the NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB, E7490L). 
Raw sequencing reads were mapped to the mouse genome (mm10) 
with STAR (v.2.7.2a) aligner. The resultant strand-specific read counts
of all samples were integrated into a matrix for downstream analysis.
Differentially expressed genes were detected using DESeq2 (v.1.40.2),
using its default two-sided Wald test with the cut-off of log2(fold expression
change) > 0.5 and multiple test-adjusted P value < 0.05. The top 500
variable genes from DESeq2 across all samples were used to perform
principal component analysis. The Tbxt target genes were obtained
from a previous publication, defined by significant Tbxt ChIP-seq
peak signals detected in in vitro-differentiated mouse ES cells. The set
of Tbxt target genes was intersected with the significant differentially
expressed genes identified in each mutant sample compared with the
wild-type controls, and these were aggregated to generate the overall
set of differentially expressed Tbxt target genes across the analysed
mouse ES cell lines. These differentially expressed Tbxt target genes
were visualized using a heatmap, with the log-transformed normalized
transcript matrix followed by z score standardization across samples.
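
The downstream filtering described above (significance cut-offs, intersection with Tbxt targets, and per-gene z scores) can be sketched with the standard library alone. This is an illustration, not the authors' analysis code: the gene names and numbers are invented, applying the 0.5 cut-off to the absolute log2 fold change is an assumption, and the sample standard deviation is used for the z score.

```python
from statistics import mean, stdev


def is_significant(log2_fold_change, adjusted_p, lfc_cutoff=0.5, alpha=0.05):
    """Apply the fold-change and adjusted P value cut-offs.

    Assumption: the cut-off applies to the absolute log2 fold change.
    """
    if adjusted_p is None:  # genes without an adjusted P value are skipped
        return False
    return abs(log2_fold_change) > lfc_cutoff and adjusted_p < alpha


def de_target_genes(results_by_mutant, target_genes):
    """Union, across mutants, of significant genes that are also Tbxt targets."""
    selected = set()
    for results in results_by_mutant.values():
        significant = {
            gene for gene, (lfc, padj) in results.items()
            if is_significant(lfc, padj)
        }
        selected |= significant & set(target_genes)
    return selected


def zscore_row(values):
    """Standardize one gene's values across samples."""
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        return [0.0 for _ in values]
    return [(value - centre) / spread for value in values]


# Invented example: {mutant: {gene: (log2 fold change, adjusted P value)}}.
toy_results = {
    "mutant_1": {"GeneA": (1.2, 0.001), "GeneB": (0.3, 0.001), "GeneC": (-0.9, 0.01)},
    "mutant_2": {"GeneA": (0.1, 0.50), "GeneD": (2.0, 0.04), "GeneE": (1.5, None)},
}
toy_targets = ["GeneA", "GeneC", "GeneE"]

selected_genes = de_target_genes(toy_results, toy_targets)
```

In this toy input GeneD is significant but is not a target, and GeneE is a target without an adjusted P value, so neither is selected.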


Reporting summary 
Further information on research design is available in the Nature Port- 
folio Reporting Summary linked to this article. 
Raw and processed sequencing data in the manuscript have been
from GitHub (https://github.com/boxialaboratory/Tail-Loss-Primates).
Extended Data Fig. 2 | RNA structure prediction. a, Predicted RNA secondary
structure of the TBXT exon 5-to-exon 7 sequence using the RNAfold algorithm
of the ViennaRNA package. The paired AluY-AluSx1 region is highlighted.
Extended Data Fig. 3 | Analyses of TBXT isoforms. a, In vitro differentiation of 
human and mouse ESCs for inducing TBXT/Tbxt expression. The human and mouse 
ESC differentiation assays were adapted from Xi et al. (2017) and Pour et al. 
(2019), respectively. Schematic adapted from icons created by Marcel Tisch via 
bioicons.com. b, Quantitative RT-PCR (RT-qPCR) of TBXT and MIXL1 expression 
during hESC differentiation, indicating correct induction of the mesodermal gene 
expression program. c, RT-qPCR of Tbxt expression during mESC 
differentiation. Data in b and c were presented as mean ± standard deviation 
of the relative gene expression levels. Sample number n = 3 represents RT-qPCR 
results from three biologically independent RNA samples, with each data point 
averaged from three technical replicates in quantitative PCR. d, RT-PCR of 
TBXT/Tbxt transcripts in differentiated human and mouse ESCs, highlighting the 
Δexon6 splicing isoform unique to human. RT-PCR results were presented as 
biological duplicates. e, Protein sequence alignment of the TBXT exon 6 region 
in representative mammals. All presented animals have tails except human and 
chimpanzee. f, The exon 6-coded peptide of the Tbxt protein overlaps with large 
fractions of two transcription regulation domains. TA, transcription activation; 
TR, transcription repression. Functional domain annotation of the mouse Tbxt 
protein was adapted from Kispert et al. (1995). 
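The summary statistic described for panels b and c can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that each biological sample is first averaged over its technical replicates, and that the reported standard deviation is the sample standard deviation (n - 1 denominator) across biological samples. The function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def summarize_expression(samples):
    """Return (mean, SD) across biological samples.

    `samples` is a list with one inner list per biological sample; each
    inner list holds the technical-replicate measurements of that sample.
    """
    # Step 1: collapse technical replicates into one value per sample.
    per_sample = [mean(replicates) for replicates in samples]
    # Step 2: summarize across biological samples (assumed n - 1 SD).
    return mean(per_sample), stdev(per_sample)


# Hypothetical relative-expression values: 3 biological samples,
# each measured in 3 technical replicates.
example = [[1.0, 1.1, 0.9], [2.0, 2.1, 1.9], [3.0, 3.1, 2.9]]
avg, sd = summarize_expression(example)
print(f"mean = {avg:.2f}, SD = {sd:.2f}")
```

Averaging technical replicates first keeps n equal to the number of biological samples, so the spread reflects biological rather than pipetting variation.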
Extended Data Fig. 4 | Validation of Alu-deletion hESC clones and the 
expressed TBXT isoforms. a, PCR validation of the hESC clones with deletions 
of AluY or AluSx1 in TBXT. PCR validation for each sample was performed in 
pairs, each amplifying both the AluSx1 locus (Sx1) and the AluY locus (Y) using 
primers that bind the two flanking sequences of the targeted region, 
respectively. Each genotype included two independent clones of AluY deletion 
or AluSx1 deletion, corresponding to the two clones presented in Fig. 2b. 
b, Sanger sequencing of the two TBXT isoform transcripts detected in 
Fig. 2b. The sequencing results were aligned to the TBXT full-length mRNA 
sequence. 
Extended Data Fig. 5 | Tbxt-Δexon6/+ founder mice generated through zygotic 
CRISPR/Cas9 targeting approach. a, Schematic of zygotic injection of 
CRISPR/Cas9 reactions. b, Two more Tbxt-Δexon6/+ founder mice (in addition to 
the one shown in Fig. 3d) indicating an absence or reduced form of the tail. 
c, Sanger sequencing of the exon 6-deleted allele isolated from the genomic DNA 
of Tbxt-Δexon6/+ founder mice. Founder 1 had an unexpected insertion of 23 base 
pairs at the CRISPR cutting site in the original intron 5 of Tbxt. Both founder 2 
and 3 had the exact fusion between the two CRISPR cutting sites in introns 5 
and 6. d-e, Capture-seq analyses at the Tbxt locus of founder mice did not detect 
off-target mutations. A zoomed-in view of the Capture-seq results at the Tbxt 
locus highlights the CRISPR-mediated exon 6 deletion (e). Capture-seq baits 
were generated from bacterial artificial chromosomes (RP24-88H3 and 
RP23-159G7). The shallow-covered regions are typically repeat sequences in the 
mouse genome and are consistent across samples. Control DNAs were obtained 
from wild-type or Tbxt-Δexon6/Δexon6 mESCs, and the heterozygous sample came 
from a 1:1 mixture of genomic DNA from wild-type and Tbxt-Δexon6/Δexon6 mESCs. 
Extended Data Fig. 6 | Engineering of inverted sequence pairs in mouse Tbxt 
induces alternative splicing. a-b, Schematics of mouse Tbxt gene structure 
with inserted human AluSx1-AluY pair (a, Tbxt-insASAY) or a designed intronic 
reverse complementary sequence of 297 bp (b, Tbxt-insRCS). The designed 
RCS insertion has the same length as AluY in human TBXT. In both designs, a 
two-step experimental procedure was adapted by first integrating the target 
elements with a selection cassette of puromycin-resistance and truncated 
thymidine kinase (puro-ΔTK) gene into the intron of mouse Tbxt, followed by 
removal of the selection cassette through transiently expressing Cre 
recombinase (Methods). c, Tbxt transcripts detected through RT-PCR using 
differentiated mouse ESC lines across wild-type (left), homozygous Tbxt-insASAY 
(Tbxt-insASAY/insASAY, middle), and homozygous Tbxt-insRCS (Tbxt-insRCS/insRCS, 
right) genotypes. RT-PCR results were presented as biological duplicates for 
each genotype. d, mESC injection into diploid or tetraploid blastocyst for 
generating Tbxt-insASAY/insASAY and Tbxt-insRCS/insRCS mouse models. 
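The notion of an inverted sequence pair can be illustrated with a short sketch: a sequence and its reverse complement, when both lie on the same transcript, are able to base-pair with each other. The helper names and the toy sequence below are hypothetical, and the code only illustrates reverse complementarity; it is not the procedure used to design the RCS insertion.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_inverted_pair(seq_a: str, seq_b: str) -> bool:
    """True if seq_b is exactly the reverse complement of seq_a."""
    return seq_b.upper() == reverse_complement(seq_a).upper()


# Toy example (hypothetical sequence, far shorter than a real Alu element).
intronic = "ATGCC"
rcs = reverse_complement(intronic)
print(rcs, is_inverted_pair(intronic, rcs))
```

Real inverted repeats such as Alu pairs are only partially complementary, so an exact-match check like this is a simplification.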
Extended Data Fig. 7 | Validation of Tbxt-insASAY/insASAY and 
Tbxt-insRCS2/insRCS2 homozygous mice. a-b, Capture-seq reads mapped to mouse 
reference genome mm10 at the full Tbxt locus (a) and a zoom-in view of the Tbxt 
gene region (b) in Tbxt-insASAY/insASAY and Tbxt-insRCS2/insRCS2 homozygous 
mice. Black ticks under each coverage track indicate the detected SNVs 
referring to the mm10 genome (a). The black bars in (b) indicate the detection 
of reads supporting an inversion. As expected, Tbxt-insRCS2/insRCS2 samples 
incorporated an intronic sequence insertion, thus resembling an inversion event 
at the insertion site (b) due to forced mapping of the sequencing reads to the 
reference genome. c, Sanger sequencing of the Tbxt-insRCS2/insRCS2 genomic 
sequence confirmed the inserted sequence and the exact insertion site. The 
inserted sequence constitutes a 220 bp sequence from Tbxt intron 6 
(mm10 chr17: 8439335-8439554). d, Sanger sequencing of RT-PCR results using 
total RNA extracted from the tailbud of a Tbxt-insRCS2/insRCS2 embryo at stage 
E10.5. The results correspond to Fig. 4e. 




Extended Data Fig. 8 | Exon6 deletion of Tbxt may lead to neural tube defects 
in mouse. a, Analyzing E11.5 Tbxt-Δexon6/Δexon6 mouse embryos obtained through 
intercrossing Tbxt-Δexon6/+ mice. Tbxt-Δexon6/Δexon6 embryos either developed 
neural tube closure defects and died at birth (middle) or arrested at 
approximately stage E9 during development (right). Red and black dashed lines 
mark the embryonic tail regions and limb buds, respectively. Green arrowheads 
indicate malformed spinal cord regions. b-c, Both Tbxt-Δexon6/Δexon6 (b) and 
Tbxt-Δexon6/+ (c) neonatal mice may present neural tube closure defects during 
embryonic development. The presented embryos were the only two cases found in 
this study that died after birth with neural tube closure defects. 
Extended Data Fig. 9 | RNA-seq analysis of Tbxt target genes in differentiated 
mESC lines across genotypes. The analyzed mESC lines include wild-type, 
Tbxt-insASAY/insASAY, Tbxt-Δexon6/+ and Tbxt-Δexon6/Δexon6 genotypes, each in 
duplicates. a, Scatter plot of the samples using the first two principal 
component (PC) coordinates in principal component analysis. b, Heatmap of 
Tbxt-target genes that were differentially expressed across the analyzed 
samples. The Tbxt-target gene list was obtained from Tbxt ChIP-seq results using 
in vitro differentiated mESCs (Lolas et al. 2014; citation 41). Functionally 
characterized Tbxt-target genes were labeled on the y-axis of the heatmap. 
c-e, Volcano plots of differentially expressed (DE) genes comparing mutant 
mESCs with the wild-type mESC. DE genes were identified using DESeq2 (version 
1.40.2) through its default two-sided Wald test and a cutoff of log2 fold 
expression change >0.5 and multiple test-adjusted p value (p.adj) <0.05. For 
each plot, DE Tbxt-target genes were highlighted in red, and the top DE genes 
among this group were labeled. 
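The cutoff described in this legend can be expressed as a simple filter. The sketch below assumes that the fold-change threshold applies to the absolute log2 fold change (so that both up- and down-regulated genes qualify), which the legend does not state explicitly; it also treats a missing adjusted p value as not significant. The function name, gene labels and numbers are hypothetical, and the filter is meant to run on a results table already exported from DESeq2.

```python
from typing import Optional


def is_differentially_expressed(
    log2_fold_change: float,
    padj: Optional[float],
    lfc_cutoff: float = 0.5,
    padj_cutoff: float = 0.05,
) -> bool:
    """Apply the log2 fold change and adjusted p value cutoffs."""
    if padj is None:  # no adjusted p value available for this gene
        return False
    # Assumption: the fold-change cutoff is applied to the absolute value.
    return abs(log2_fold_change) > lfc_cutoff and padj < padj_cutoff


# Hypothetical rows: (gene label, log2 fold change, adjusted p value).
results = [
    ("geneA", 1.2, 0.001),
    ("geneB", -0.8, 0.03),
    ("geneC", 0.3, 0.0001),  # fails the fold-change cutoff
    ("geneD", 2.0, None),    # no adjusted p value
]
de_genes = [name for name, lfc, padj in results if is_differentially_expressed(lfc, padj)]
print(de_genes)
```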
Extended Data Fig. 10 | A model for tail-loss evolution in the early hominoids. 
The AluY insertion in TBXT may have marked a key genetic event that contributed 
to tail-loss evolution in the hominoid common ancestor. Additional genetic 
changes (pre-existing in the ancestral genome or occurring after the AluY 
insertion) may have also acted to promote or stabilize the no-tail phenotype 
in the early hominoids. 
Statistics 


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 
Data collection: Tail development-related genes in vertebrates were collected from the MGI mouse phenotype database. The gene structure annotations of 
the 140 genes were downloaded from BioMart of Ensembl 109. Multiz30way alignments of genomic sequences across 27 primate species 
were downloaded from the UCSC Genome Browser, referring to hg38. The homologous regions of the tail-development genes and their 10 kb 
upstream and downstream sequences were extracted from the Multiz30way alignment using bedtools (v2.30.0). The Tbxt-target genes were 
collected from Lolas et al. 2014 (citation 41). All software information is described in the Methods section. 


Data analysis: We used a custom analysis pipeline to extract the primate multiple sequence alignment files at the genomic regions of the designated genes. 
We used the Variant Effect Predictor (VEP), integrated in Ensembl 109, to infer the potential functional impact. Protein sequence alignments 
were done by the MUSCLE algorithm using MEGA X software with default settings. RNA secondary structure prediction was performed using 
RNAfold (version 2.6.0) through the ViennaRNA Web Server (http://rna.tbi.univie.ac.at/). All data analyses using public software and/or 
resources are described in the Methods section. The relevant code and processed data for this manuscript are available on GitHub 
(https://github.com/boxialaboratory/Tail-Loss-Primates). 
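The 10 kb upstream and downstream flanks mentioned under data collection amount to padding each gene interval on both sides while staying within the chromosome. The sketch below shows that arithmetic only; it is not the authors' pipeline and does not call bedtools. It assumes BED-style coordinates (0-based start, end-exclusive), and the function name and example coordinates are hypothetical.

```python
def add_flanks(start: int, end: int, chrom_size: int, flank: int = 10_000):
    """Extend a BED-style interval by `flank` bases on each side.

    The result is clipped so it never runs past either chromosome end.
    """
    if not 0 <= start < end <= chrom_size:
        raise ValueError("interval must satisfy 0 <= start < end <= chrom_size")
    return max(0, start - flank), min(chrom_size, end + flank)


# Hypothetical gene interval on a hypothetical 1 Mb chromosome.
print(add_flanks(25_000, 30_000, 1_000_000))
# An interval close to both chromosome ends is clipped on both sides.
print(add_flanks(4_000, 9_000, 12_000))
```

Clipping matters for genes near chromosome ends, where a fixed 10 kb pad would otherwise produce coordinates outside the assembly.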


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and 
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information. 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A description of any restrictions on data availability 


- For clinical datasets or third party data, please ensure that the statement adheres to our policy 


Raw and processed sequencing data in the manuscript have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE252196. 
Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size: Sample size was not specifically predetermined for the in vitro cell culture experiments. We used at least two independent biological 
replicates, each with at least technical duplicates, for in vitro molecular experiments. For the analysis of mouse mutants, the number of analyzed 
mice is indicated in the manuscript. 


Data exclusions: Tail length measurement in Tbxt-insASAY/insASAY mice excluded one mouse, whose shorter tail was evidently due to injury. 


Replication: The molecular experiments for analyzing TBXT splicing in ESCs were replicated independently at least three times. All replication 
attempts were consistent with the reported results. The incomplete penetrance of mouse mutant phenotypes was stable. 


Randomization: Mice used for the experiments and tail length measurements were taken randomly from multiple litters. 


Blinding: The investigators were not blinded to the mouse experiments, as the results were consistent across multiple researchers in pilot experiments. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems (n/a | Involved in the study): 
- Antibodies 
- Eukaryotic cell lines 
- Palaeontology and archaeology 
- Animals and other organisms 
- Human research participants 
- Clinical data 
- Dual use research of concern 

Methods (n/a | Involved in the study): 
- ChIP-seq 
- Flow cytometry 
- MRI-based neuroimaging 


Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s): hESCs (H1) were obtained from WiCell Research; mESCs were derived from a male, wild-type C57BL/6J mouse. 


Authentication: The hESC line was not authenticated by us but by WiCell, which used short tandem repeat (STR) profiling to 
authenticate the cell lines (protocol available at 
https://www.wicell.org/home/characterization/identity/short-tandem-repeat-str/short-tandem-repeat-str.cmsx). 
The mESC line was authenticated by its competence for contributing to embryos 
when cultured under feeder cell-dependent conditions followed by blastomere injection. 


Mycoplasma contamination: All cell lines tested negative in our routine qPCR-based mycoplasma tests. 


Commonly misidentified lines (See ICLAC register): No commonly misidentified cell lines were used in the study. 


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals: The Tbxt-Δexon6/+ mouse model (Mus musculus, C57BL/6J) was generated through zygotic injection of CRISPR/Cas9 reagents. Other 
models were generated through blastocyst injection of engineered mESCs. Specifically, C57BL/6J-albino female mice (Charles River 
Laboratories, strain#493) were used for harvesting blastocysts. B6D2F1/J (Jackson Laboratory, strain#100006) mice were used for 
harvesting blastocysts for fusion to tetraploid blastocysts. Wild-type C57BL/6J (strain#000664) mice were obtained from The Jackson 
Laboratory. Mice were housed in the NYU Langone Health BSL1 barrier facility. Mice 5-30 weeks old were used for breeding and 
phenotype analysis, with details on age and sex described in the specific results sections. 


Wild animals: The study did not involve wild animals. 
Field-collected samples: The study did not involve samples collected from the field. 


Ethics oversight: The study was performed following NYU Langone Health-approved ethical guidance and regulation on laboratory animals. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


